Performance Measurements of Supercomputing and Cloud Storage Solutions

Author: Arcand William
Bergeron Bill
Bestor David
Gadepally Vijay
Houle Michael
Hubbell Matthew
Jones Michael
Kepner Jeremy
Michaleas Peter
Monticciolo Paul
Prout Andrew
Reuther Albert
Samsi Siddharth
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/08/2017

Increasing amounts of data from varied sources, particularly in the fields of machine learning and graph analytics, are causing storage requirements to grow rapidly. A variety of technologies exist for storing and sharing these data, ranging from parallel file systems used by supercomputers to distributed block storage systems found in clouds. Relatively few comparative measurements exist to inform decisions about which storage systems are best suited for particular tasks. This work provides these measurements for two of the most popular storage technologies: Lustre and Amazon S3. Lustre is an open-source, high-performance parallel file system used by many of the largest supercomputers in the world. Amazon's Simple Storage Service, or S3, is part of the Amazon Web Services offering and provides a scalable, distributed option to store and retrieve data from anywhere on the Internet. Parallel processing is essential for achieving high performance on modern storage systems. The performance tests used span the gamut of parallel I/O scenarios, ranging from single-client, single-node Amazon S3 and Lustre performance to a large-scale, multi-client test designed to demonstrate the capabilities of a modern storage appliance under heavy load. These results show that, when parallel I/O is used correctly (i.e., many simultaneous read or write processes), full network bandwidth is achievable, ranging from 10 gigabits/s over a 10 GigE S3 connection to 0.35 terabits/s using Lustre on a 1200-port 10 GigE switch. These results demonstrate that S3 is well-suited to sharing vast quantities of data over the Internet, while Lustre is well-suited to processing large quantities of data locally.
Comment: 5 pages, 4 figures, to appear in IEEE HPEC 201
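The abstract's point about "many simultaneous read or write processes" can be illustrated with a minimal sketch. It is not the benchmark used in the paper: it writes to a local temporary directory rather than to Lustre or S3, it uses threads as a stand-in for separate processes, and the names `write_chunk`, `parallel_write`, and `gigabits_per_second` are hypothetical helpers introduced only for this illustration.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Tuple


def write_chunk(path: str, size_bytes: int) -> int:
    """Write size_bytes of zeros to path and return the number of bytes written."""
    with open(path, "wb") as f:
        f.write(b"\0" * size_bytes)
    return size_bytes


def parallel_write(directory: str, n_writers: int, size_bytes: int) -> Tuple[int, float]:
    """Run n_writers simultaneous writers, one file each.

    Returns (total bytes written, elapsed wall-clock seconds).
    Threads are an assumption made for brevity; a real test would
    typically use separate processes or separate client nodes.
    """
    paths = [os.path.join(directory, f"part_{i}.bin") for i in range(n_writers)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_writers) as pool:
        written = list(pool.map(write_chunk, paths, [size_bytes] * n_writers))
    elapsed = time.perf_counter() - start
    return sum(written), elapsed


def gigabits_per_second(total_bytes: int, seconds: float) -> float:
    """Convert a byte count and a duration into gigabits per second."""
    return total_bytes * 8 / seconds / 1e9


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        total, secs = parallel_write(tmp, n_writers=8, size_bytes=4 * 1024 * 1024)
        rate = gigabits_per_second(total, secs)
        print(f"{total} bytes in {secs:.3f} s = {rate:.2f} Gb/s aggregate")
```

The aggregate rate is the total volume divided by the wall-clock time of the whole batch, which is the quantity a figure such as "10 gigabits/s" refers to. On a local disk the printed number mostly reflects the page cache and says nothing about Lustre or S3.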

arXiv.org e-Print Archive

Crossref

Lustre, Hadoop, Accumulo

Author: Arcand William
Bergeron Bill
Bestor David
Byun Chansup
Edwards Lauren
Gadepally Vijay
Hubbell Matthew
Kepner Jeremy
Michaleas Peter
Mullen Julie
Prout Andrew
Reuther Albert
Rosa Antonio
Yee Charles
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 08/07/2015

Data processing systems impose multiple views on data as it is processed by the system. These views include spreadsheets, databases, matrices, and graphs. There is a wide variety of technologies that can be used to store and process data through these different steps. The Lustre parallel file system, the Hadoop distributed file system, and the Accumulo database are all designed to address the largest and the most challenging data storage problems. There have been many ad-hoc comparisons of these technologies. This paper describes the foundational principles of each technology, provides simple models for assessing their capabilities, and compares the various technologies on a hypothetical common cluster. These comparisons indicate that Lustre provides 2x more storage capacity, is less likely to lose data during 3 simultaneous drive failures, and provides higher bandwidth on general purpose workloads. Hadoop can provide 4x greater read bandwidth on special purpose workloads. Accumulo provides 10,000x lower latency on random lookups than either Lustre or Hadoop, but Accumulo's bulk bandwidth is 10x less. Significant recent work has been done to enable mix-and-match solutions that allow Lustre, Hadoop, and Accumulo to be combined in different ways.
Comment: 6 pages; accepted to IEEE High Performance Extreme Computing conference, Waltham, MA, 201

arXiv.org e-Print Archive

Crossref

Enabling On-Demand Database Computing with MIT SuperCloud Database Management System

Author: Arcand William
Bergeron Bill
Bestor David
Byun Chansup
Edwards Lauren
Gadepally Vijay
Hubbell Matthew
Kepner Jeremy
Michaleas Peter
Mullen Julie
Prout Andrew
Reuther Albert
Rosa Antonio
Yee Charles
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 29/06/2015

The MIT SuperCloud database management system allows for rapid creation and flexible execution of a variety of the latest scientific databases, including Apache Accumulo and SciDB. It is designed to permit these databases to run on a High Performance Computing Cluster (HPCC) platform as seamlessly as any other HPCC job. It ensures the seamless migration of the databases to the resources assigned by the HPCC scheduler and centralized storage of the database files when not running. It also permits snapshotting of databases to allow researchers to experiment and push the limits of the technology without concerns for data or productivity loss if the database becomes unstable.
Comment: 6 pages; accepted to IEEE High Performance Extreme Computing (HPEC) conference 2015. arXiv admin note: text overlap with arXiv:1406.492

arXiv.org e-Print Archive

Crossref

Lessons Learned from a Decade of Providing Interactive, On-Demand High Performance Computing to Scientists and Engineers

Author: Arcand William
Bergeron Bill
Bestor David
Byun Chansup
Gadepally Vijay
Houle Michael
Hubbell Matthew
Jones Michael
Kepner Jeremy
Klein Anna
Michaleas Peter
Milechin Lauren
Mullen Julia
Prout Andrew
Reuther Albert
Rosa Antonio
Samsi Siddharth
Yee Charles
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 05/03/2019

For decades, the use of HPC systems was limited to those in the physical sciences who had mastered their domain in conjunction with a deep understanding of HPC architectures and algorithms. During these same decades, consumer computing device advances produced tablets and smartphones that allow millions of children to interactively develop and share code projects across the globe. As the HPC community faces the challenges associated with guiding researchers from disciplines using high productivity interactive tools to effective use of HPC systems, it seems appropriate to revisit the assumptions surrounding the necessary skills required for access to large computational systems. For over a decade, MIT Lincoln Laboratory has been supporting interactive, on-demand high performance computing by seamlessly integrating familiar high productivity tools to provide users with an increased number of design turns, rapid prototyping capability, and faster time to insight. In this paper, we discuss the lessons learned while supporting interactive, on-demand high performance computing from the perspectives of the users and the team supporting the users and the system. Building on these lessons, we present an overview of current needs and the technical solutions we are building to lower the barrier to entry for new users from the humanities, social, and biological sciences.
Comment: 15 pages, 3 figures, First Workshop on Interactive High Performance Computing (WIHPC) 2018 held in conjunction with ISC High Performance 2018 in Frankfurt, Germany

arXiv.org e-Print Archive

Crossref

Measuring the Impact of Spectre and Meltdown

Author: Arcand William
Bergeron Bill
Bestor David
Byun Chansup
Gadepally Vijay
Houle Michael
Hubbell Matthew
Jones Michael
Kepner Jeremy
Klein Anna
Michaleas Peter
Milechin Lauren
Mullen Julie
Prout Andrew
Reuther Albert
Rosa Antonio
Samsi Siddharth
Yee Charles
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 23/07/2018

The Spectre and Meltdown flaws in modern microprocessors represent a new class of attacks that have been difficult to mitigate. The mitigations that have been proposed have known performance impacts. The reported magnitude of these impacts varies depending on the industry sector and expected workload characteristics. In this paper, we measure the performance impact on several workloads relevant to HPC systems. We show that the impact can be significant on both synthetic and realistic workloads. We also show that the performance penalties are difficult to avoid even in dedicated systems where security is a lesser concern.

arXiv.org e-Print Archive

Crossref

Benchmarking SciDB Data Import on HPC Systems

Author: Arcand William
Bergeron Bill
Bestor David
Brattain Laura
Byun Chansup
Gadepally Vijay
Houle Michael
Hubbell Matthew
Jones Michael
Kepner Jeremy
Klein Anna
Michaleas Peter
Milechin Lauren
Mullen Julie
Prout Andrew
Reuther Albert
Rosa Antonio
Samsi Siddharth
Yee Charles
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 23/09/2016

SciDB is a scalable, computational database management system that uses an array model for data storage. The array data model of SciDB makes it ideally suited for storing and managing large amounts of imaging data. SciDB is designed to support advanced in-database analytics, thus reducing the need for extracting data for analysis. It is designed to be massively parallel and can run on commodity hardware in a high performance computing (HPC) environment. In this paper, we present the performance of SciDB using simulated image data. The Dynamic Distributed Dimensional Data Model (D4M) software is used to implement the benchmark on a cluster running the MIT SuperCloud software stack. A peak performance of 2.2M database inserts per second was achieved on a single node of this system. We also show that SciDB and the D4M toolbox provide more efficient ways to access random sub-volumes of massive datasets compared to the traditional approaches of reading volumetric data from individual files. This work describes the D4M and SciDB tools we developed and presents the initial performance results. This performance was achieved by using parallel inserts and in-database merging of arrays, as well as supercomputing techniques such as distributed arrays and single-program-multiple-data programming.
Comment: 5 pages, 4 figures, IEEE High Performance Extreme Computing (HPEC) 2016, best paper finalist

arXiv.org e-Print Archive

Crossref

Achieving 100,000,000 database inserts per second using Accumulo and D4M

Author: Arcand William
Bergeron Bill
Bestor David
Byun Chansup
Gadepally Vijay
Hubbell Matthew
Kepner Jeremy
Michaleas Peter
Mullen Julie
Prout Andrew
Reuther Albert
Rosa Antonio
Yee Charles
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 18/06/2014

The Apache Accumulo database is an open-source, relaxed-consistency database that is widely used for government applications. Accumulo is designed to deliver high performance on unstructured data such as graphs of network data. This paper tests the performance of Accumulo using data from the Graph500 benchmark. The Dynamic Distributed Dimensional Data Model (D4M) software is used to implement the benchmark on a 216-node cluster running the MIT SuperCloud software stack. A peak performance of over 100,000,000 database inserts per second was achieved, which is 100x larger than the highest previously published value for any other database. The performance scales linearly with the number of ingest clients, number of database servers, and data size. The performance was achieved by adapting several supercomputing techniques to this application: distributed arrays, domain decomposition, adaptive load balancing, and single-program-multiple-data programming.
Comment: 6 pages; to appear in IEEE High Performance Extreme Computing (HPEC) 201

arXiv.org e-Print Archive

Crossref